Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to their high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency: insufficient training of the subnetworks in the hypernetwork. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, with the aim of boosting the convergence of individual models. We introduce the concept of the prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
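The core mechanism described above, maintaining the best-performing paths seen so far and using the best of them as a teacher for distillation, can be sketched as a small data structure. This is an illustrative sketch, not the paper's implementation; all names (`PathBoard`, `update`, `best_teacher`) and the board capacity are assumptions introduced here.

```python
# Hedged sketch of a "prioritized path board": keep the K best-performing
# subnetwork paths observed during hypernetwork training, so the current
# best can serve as a distillation teacher. Names are illustrative.

class PathBoard:
    """Fixed-capacity board of (score, path) pairs, best score first."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.entries = []  # list of (score, path), sorted descending by score

    def update(self, path, score):
        """Insert a candidate path; evict the worst if over capacity."""
        self.entries.append((score, path))
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]

    def best_teacher(self):
        """Return the highest-scoring path, to act as the teacher."""
        return self.entries[0][1] if self.entries else None


board = PathBoard(capacity=2)
board.update("path_a", 0.61)
board.update("path_b", 0.74)
board.update("path_c", 0.69)  # evicts path_a, the lowest-scoring entry
print(board.best_teacher())            # -> path_b
print([p for _, p in board.entries])   # -> ['path_b', 'path_c']
```

In a full training loop, each sampled subnetwork would be evaluated on a small validation batch, offered to the board, and distilled from `best_teacher()` alongside the ordinary task loss.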
Author Feedback
We respond to the concerns point by point below. Why does distilling prioritized paths improve architecture rating? More sufficient training of the subnetworks leads to a more accurate architecture rating [6] (Sec. 4.3). Which set is used to train the matching network? We will revise the manuscript to make this point clearer.